Nature Computational Science

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match Nature Computational Science's content profile, based on 50 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.

1
Unsupervised seizure annotation and detection with neural dynamic divergence

Ojemann, W. K. S.; Xu, Z.; Shi, H.; Walsh, K.; Pattnaik, A. R.; Sinha, N.; Lavelle, S.; Aguila, C.; Gallagher, R.; Revell, A. Y.; LaRocque, J. J.; Korzun, J.; Kulick-Soper, C. V.; Zhou, D. J.; Galer, P. D.; Sinha, S. R.; Shinohara, R.; Davis, K. A.; Litt, B.; Conrad, E. C.

2026-02-17 neurology 10.64898/2026.02.15.26346325 medRxiv
Top 0.1%
17.3%

Annotating seizure onset and spread in intracranial EEG is essential for epilepsy surgical planning, yet manual annotation is unreliable and cannot scale to large datasets. We introduce Neural Dynamic Divergence (NDD), an unsupervised framework that detects seizure activity by measuring deviation from patient-specific baseline neural dynamics using autoregressive models. NDD requires no labeled training data and adapts to individual patients, channels, and brain states. Validating against expert consensus annotations from 46 seizures, NDD achieves human-level agreement (φ = 0.58 vs. inter-rater φ = 0.64) and outperforms existing algorithms on 1,019 seizures with soft labels (AUROC = 0.87). We demonstrate clinical utility by automatically annotating 2,017 seizures, revealing that seizure spread patterns distinguish epilepsy subtypes and predict surgical outcomes. NDD also generalizes to continuous ICU scalp EEG monitoring (AUROC = 0.77). We provide NDD as an open-source Python package to enable scalable seizure annotation across research centers.
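
The idea of scoring deviation from a baseline autoregressive model can be illustrated with a toy example. The sketch below is not the NDD package or its algorithm: it assumes a single channel, a first-order AR model fitted by least squares, and a score defined as the ratio of windowed prediction error to baseline prediction error. All function names are hypothetical.

```python
import random


def fit_ar1(x):
    """Least-squares AR(1) coefficient a in x[t] ~ a * x[t-1]."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def prediction_mse(x, a):
    """Mean squared one-step prediction error of the AR(1) model on x."""
    errs = [(x[t] - a * x[t - 1]) ** 2 for t in range(1, len(x))]
    return sum(errs) / len(errs)


def divergence_scores(signal, baseline_len, win):
    """Score each post-baseline window by its error relative to baseline error."""
    baseline = signal[:baseline_len]
    a = fit_ar1(baseline)                      # model of "normal" dynamics
    base_err = prediction_mse(baseline, a)
    return [
        prediction_mse(signal[s:s + win], a) / base_err
        for s in range(baseline_len, len(signal) - win + 1, win)
    ]


def synthetic_signal(n_quiet=600, n_loud=200, seed=0):
    """Toy AR(1) trace whose noise level jumps after n_quiet samples."""
    rng = random.Random(seed)
    x = [0.0]
    for t in range(1, n_quiet + n_loud):
        sigma = 1.0 if t < n_quiet else 5.0    # assumed stand-in for abnormal activity
        x.append(0.9 * x[-1] + rng.gauss(0.0, sigma))
    return x


if __name__ == "__main__":
    scores = divergence_scores(synthetic_signal(), baseline_len=400, win=100)
    print([round(s, 2) for s in scores])
```

Windows drawn from the same dynamics as the baseline score near 1, while windows that the baseline model predicts poorly score much higher. A real detector would need multichannel models, higher model order, and a principled threshold.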

2
Reward-Guided Generation Improves the Scientific Utility of Synthetic Biomedical Data

Jackson, N. J.; Espinosa-Dice, N.; Yan, C.; Malin, B. A.

2026-03-16 health informatics 10.64898/2026.03.11.26348077 medRxiv
Top 0.1%
10.1%

Synthetic data generation is a promising approach for biomedical data sharing and dataset augmentation, yet existing methods lack mechanisms to preserve statistical properties necessary for scientific analysis. To address this, we introduce RLSYN+REG, a reinforcement learning-driven generative model that encourages regression models trained on synthetic data to reproduce the coefficients and predictions of their real-data counterparts. We evaluate RLSYN+REG on MIMIC-III and the American Community Survey (ACS) across regression model reproduction, fidelity to real data, and privacy. Synthetic data from RLSYN+REG substantially improves upon that of RLSYN, raising correlations between real and synthetic regression coefficients from 0.054 to 0.600 on MIMIC-III and from 0.160 to 0.376 on ACS. Predictive performance also improves, reducing the gap to real-data baselines by 81.4% and 97.6% on MIMIC-III and ACS, respectively. These improvements come with negligible cost to fidelity or privacy and are robust to reductions in training data.

3
Integrative Inference of Spatially Resolved Cell Lineage Trees using LineageMap

Pan, X.; Chen, Y.; Zhang, X.

2026-01-22 developmental biology 10.64898/2026.01.19.700383 medRxiv
Top 0.1%
9.9%

Understanding the spatio-temporal processes of tissue growth, including how new cell types emerge and how cells form the tissue architecture, is a fundamental problem in biology. Emerging spatially resolved lineage tracing data, where three modalities (lineage barcodes, gene expression profiles, and spatial locations) are measured for each single cell, provide an unprecedented opportunity to understand these processes. Computational methods that take advantage of all three modalities to reconstruct cell lineage trees and ancestral cell states and locations are needed. We introduce LineageMap, a hybrid lineage inference algorithm that integrates the scalability of distance-based tree reconstruction methods with the flexibility of likelihood-based methods under a unified probabilistic framework. The input to LineageMap is spatially resolved lineage tracing data, where for each single cell, the gene expression, lineage barcode and spatial locations are available. LineageMap enables accurate, interpretable, and scalable inference of high-resolution lineage trees as well as locations of ancestral cells from the tri-modality single-cell data. Across simulated and experimental datasets, LineageMap consistently outperforms existing methods in the accuracy of reconstructed cell lineage trees, while revealing biologically coherent spatiotemporal trajectories. Our framework bridges molecular lineage tracing with spatial and transcriptomic information, advancing computational reconstruction of dynamic cellular ancestries in both time and space. LineageMap is available at: https://github.com/ZhangLabGT/LineageMap.

4
DynMoCo: a Novel AI Framework to Reveal Modular Substructures of Protein From Molecular Dynamics

Mao, L.; Kwak, M.; Ashkezari, A. H. K.; Li, Z.; Chen, Y.; Cong, P.; Phee, J. H.; Kang, S.; Li, J.; Zhu, C.

2026-02-10 biophysics 10.64898/2026.02.08.704355 medRxiv
Top 0.1%
9.8%

Proteins are dynamic molecular machines whose functions are determined by their structures. While static structures can offer initial insights or hypotheses about protein function, they are often insufficient for a detailed mechanistic understanding. Molecular dynamics (MD) simulations provide an atomistic view of proteins' dynamic motion and conformational change, but the resulting high-dimensional data are challenging to interpret. Traditional summary statistics and dimensionality-reduction methods often focus on global motions and can overlook regional, yet functionally critical motions. Inspired by approaches from social network science, we introduce a novel perspective for analyzing MD simulations through dynamic community detection, where molecules are modeled as time-evolving graphs, and communities of residues or atoms that move coherently or exhibit functional coupling are identified. We present DynMoCo, a novel deep learning framework that integrates graph convolutional networks with recurrent models for end-to-end dynamic community detection on molecular graphs. Given an MD trajectory, DynMoCo identifies spatially grounded substructures, tracks their evolution over time, and can incorporate structural knowledge to ensure physically meaningful communities. We provide a library of custom-written scripts to allow users to extract and visualize these communities on the MD simulated molecules in motion. We demonstrate the method on force-ramp and force-clamp steered MD simulations of three integrin systems, revealing modular substructures within known domains and characterizing their conformational rearrangements during force-induced unbending. By reducing high-dimensional MD data into interpretable communities, this approach offers new insights into the intrinsic organization and dynamic function of complex biomolecular systems.

Significance: Proteins often perform their functions through dynamic, locally coordinated motions. Molecular dynamics simulations provide detailed views of these motions but produce high-dimensional data that are challenging to analyze and interpret. We present a novel deep learning model that analyzes molecular dynamics simulation data and identifies structurally coherent and potentially functionally related communities, while tracking their temporal evolution. This analysis tool provides a novel way to analyze MD data, transforming it into interpretable representations of modular dynamics, enabling discovery of new mechanistic insights and advancing our understanding of how molecular motions drive biological function.

5
From Prefix to Path: Learning Temporally Consistent Biomolecular Dynamics from Limited Initial Data

Choudhuri, S.; Adhikari, S.; Mondal, J.

2026-03-05 biophysics 10.64898/2026.03.02.709204 medRxiv
Top 0.1%
9.8%

Molecular dynamics (MD) simulations provide detailed insights into biomolecular motion but are often limited by the prohibitive cost of sampling long-timescale behavior. Here, we present a Transformer-based framework that reconstructs temporally continuous dynamical trajectories from only a small fraction of the initial data, directly targeting time-ordered evolution rather than independent ensemble snapshots. Using three systems spanning distinct dynamical regimes (intrinsically disordered α-Synuclein, Cytochrome P450 ligand-binding motion, and a synthetic three-well potential), we show that the model learns both local fluctuations and long-range temporal structure. At inference time, the model generates full trajectories autoregressively from an initial prefix as prompt, capturing metastable transitions, basin-to-basin movements, and system-specific dynamical signatures. Free-energy surfaces computed from generated trajectories closely match ground-truth landscapes and, in several cases, we observe enhanced sampling in generated trajectories relative to the training trajectories, while preserving kinetically meaningful transition patterns. These results demonstrate that Transformer architectures can serve as efficient, system-agnostic tools for time-continuous molecular trajectory prediction, offering a data-driven complement to long MD simulations and enabling accelerated exploration of conformational space.

6
Contrastive learning for antibody-antigen sequence-to-specificity prediction

Lee, H.; Castro, K.; Renwick, S.; Stalder, L.; Glanzer, W.; Kumar, R.; Chen, N.; Scheck, A.; Yermanos, A.; Mason, D.; Reddy, S. T.

2026-02-26 immunology 10.64898/2026.02.25.707916 medRxiv
Top 0.1%
8.4%

Predicting which antibodies bind to which antigens directly from primary amino acid sequences remains a major challenge, as no current method can reliably determine this specificity at both a repertoire and proteome scale. Structure-based protein design frameworks can propose antibody binders to specified antigenic epitopes, but they do not solve the "sequence-to-specificity" task of mapping antibodies to cognate epitopes, and vice versa. Here, we introduce CALM (Cross-attention Adaptive Immune Receptor-Antigen Language Model), a dual-encoder plus cross-attentive decoder architecture that treats antibody-antigen recognition as molecular translation. Using contrastive learning, antigen and antibody encoders learn a shared embedding space that aligns cognate epitope-paratope binding pairs. CALM-1.0 is trained and evaluated on 4,138 curated antibody-antigen pairs obtained from the PDB-derived structural antibody database (SAbDab). On a leakage-controlled test split drawn from sequence clusters at 80% identity and unseen during training, CALM-1.0 achieves a mean top-1 retrieval (R@1) of 7%, with consistent performance across both directions (Ab→Ag and Ag→Ab). CALM establishes a foundation for bidirectional antibody-antigen sequence-to-specificity prediction with the potential to unify retrieval and generative design.
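
For readers unfamiliar with the R@1 metric quoted above, the following sketch shows how top-1 retrieval is commonly computed in a shared embedding space. It is a generic illustration, not CALM's evaluation code: it assumes each query embedding at index i has exactly one cognate key at index i and that similarity is cosine similarity. The function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_1(queries, keys):
    """Fraction of queries whose most similar key is their paired key.

    queries[i] is assumed to be paired with keys[i].
    """
    hits = 0
    for i, q in enumerate(queries):
        best = max(range(len(keys)), key=lambda j: cosine(q, keys[j]))
        if best == i:
            hits += 1
    return hits / len(queries)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for antibody and antigen encoder outputs.
    antibodies = [[1.0, 0.0], [0.0, 1.0]]
    antigens = [[0.9, 0.1], [0.1, 0.9]]
    print(recall_at_1(antibodies, antigens))  # Ab -> Ag direction
    print(recall_at_1(antigens, antibodies))  # Ag -> Ab direction
```

Swapping the roles of queries and keys gives the two retrieval directions, which need not yield the same score on real data.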

7
Sequence Design and Phylogenetic Inference with Generative Flow Networks

Huang, Q.; Mourra-Diaz, C. M.; Wen, X.; Payette, D.

2026-04-09 synthetic biology 10.64898/2026.04.08.717239 medRxiv
Top 0.1%
8.3%

Phylogenetic inference remains computationally challenging due to the exponentially growing tree topology search space, and current methods rely heavily on multiple sequence alignments (MSAs) which are expensive and error-prone. We propose AncestorGFN, a proof-of-concept approach leveraging Generative Flow Networks (GFlowNets) for simultaneous sequence generation and phylogenetic exploration without requiring explicit MSAs. Our method learns to generate sequences matching a target distribution while the flow trajectories implicitly encode structural relationships among sequences. We demonstrate that greedy traceback on maximum-flow trajectories recovers shared intermediate states suggestive of common ancestry, and evaluate on the let-7 microRNA family where the learned flow structure qualitatively captures phylogenetic branching patterns. Furthermore, beam search at inference time discovers novel sequences clustering near known targets, suggesting applications in de novo sequence design. This work establishes an initial foundation for alignment-free phylogenetic exploration using generative models.

8
Dynamic Compression Flows for Neuroscience Data

Wei, G.; Albuquerque, D. d.; Martinez, M.; Pan, S.; Pearson, J.

2026-02-13 neuroscience 10.64898/2026.02.12.705535 medRxiv
Top 0.1%
6.9%

While neuroscience experiments have repeatedly demonstrated the involvement of large populations of neurons in even simple behaviors, these studies have just as often reported that the collective dynamics of neural activity are approximately low-dimensional. As a result, methods for identifying low-dimensional latent representations of time series data have become increasingly prominent in neuroscience. However, most existing methods either ignore temporal structure or model time evolution using latent dynamical systems approaches. In the first case, dynamics may be distorted or even scrambled in the latent space, while in the second, many possible latent dynamics may give rise to the same data. Here, we address these challenges using a novel flow-matching approach in which data are generated by a pair of flow fields, one governing time evolution, the other a mapping between data and a low-dimensional latent space. Importantly, the dimension-reducing flow is trained to minimize distortions of the temporal dynamics, learning an identifiable low-dimensional representation that preserves temporal relations in the original data. Additionally, we constrain our latent spaces to have low-dimensional support in a soft, parameterized manner, taking inspiration from ideas on nested dropout. Across both neural and behavioral data, we show that this dual flow approach produces both more interpretable dynamics and higher-quality reconstructions than competing models, including in noise-dominated data sets where conventional approaches fail.

9
LLM-Evolved Regularization Schedules Prevent Posterior Collapse in Latent Factor Analysis via Dynamical Systems

Knight, J.

2026-02-12 neuroscience 10.64898/2026.02.10.705076 medRxiv
Top 0.1%
6.8%

Latent Factor Analysis via Dynamical Systems (LFADS) is a powerful variational autoencoder for inferring neural population dynamics from spike train data. However, LFADS suffers from posterior collapse, where the learned posterior collapses to the prior, eliminating meaningful latent representations. Current solutions require computationally expensive Population-Based Training (PBT) to dynamically tune regularization hyperparameters. Here, we demonstrate that Large Language Model (LLM)-based program evolution can discover regularization schedules that prevent posterior collapse without PBT. Using FunSearch, an evolutionary algorithm that uses LLMs to generate and refine Python functions, we evolved adaptive regularization schedules that respond to training dynamics. Our best evolved schedule prevents posterior collapse across all tested conditions, maintaining KL divergence 6.5x higher than baseline schedules at 50 epochs (n = 10 seeds each, p < 0.001) and stable above 0.09 through 500 epochs across three Neural Latents Benchmark datasets, while preserving reconstruction quality. This work represents the first application of LLM-based program synthesis to variational autoencoder hyperparameter scheduling, offering a computationally efficient alternative to population-based optimization.

10
PrivateBoost: Privacy-Preserving Federated Gradient Boosting for Cross-Device Medical Data

Specht, B.; Garbaya, S.; Ermis, O.; Schneider, R.; Chavarriaga, R.; Khadraoui, D.; Tayeb, Z.

2026-03-10 health informatics 10.64898/2026.02.10.26345891 medRxiv
Top 0.1%
6.5%

Cross-device medical federated learning, where individual patients rather than institutions participate directly, poses a unique challenge: each client holds only a few samples, often just one (e.g., a single diagnostic record), leaving insufficient local data for gradient computation. Existing approaches, such as Secure Aggregation, require client-to-client coordination that is impractical for intermittently available mobile devices, while homomorphic encryption-based alternatives introduce sophisticated key management and coordination requirements ill-suited to dynamic cross-device deployments. We present privateboost, a federated XGBoost system that addresses this setting through m-of-n Shamir secret sharing with commitment-based anonymous aggregation. Clients distribute shares to a fixed set of shareholders, requiring no client-to-client communication, and the aggregator reconstructs only aggregate gradient sums via Lagrange interpolation, never observing individual values or client identities. We evaluate on UCI medical datasets, demonstrating 98% split gain retention relative to centralized XGBoost and accuracy resilient to up to 80% client dropout.
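
The aggregation step described above relies on a standard property of Shamir secret sharing: shares of different secrets can be added pointwise, and interpolating the summed shares yields the sum of the secrets. The sketch below illustrates only that property. It is not the privateboost implementation: the field modulus, the integer encoding of gradients, and all function names are assumptions, and the commitment and anonymity mechanisms are omitted.

```python
import random

PRIME = 2**61 - 1  # field modulus (a Mersenne prime; an illustrative choice)


def make_shares(secret, m, n, rng):
    """Split an integer secret into n shares; any m of them reconstruct it."""
    # Random polynomial of degree m - 1 with the secret as constant term.
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def aggregate(per_client_shares):
    """Each shareholder k sums the k-th share it received from every client."""
    n = len(per_client_shares[0])
    return [
        (per_client_shares[0][k][0],
         sum(shares[k][1] for shares in per_client_shares) % PRIME)
        for k in range(n)
    ]


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


if __name__ == "__main__":
    rng = random.Random(0)  # use a cryptographic RNG in any real deployment
    gradients = [5, 17, 42]  # one integer-encoded value per client
    per_client = [make_shares(g, m=3, n=5, rng=rng) for g in gradients]
    summed = aggregate(per_client)
    print(reconstruct(summed[:3]))  # any 3 of the 5 summed shares suffice
```

Because only the summed shares are interpolated, the party doing the reconstruction learns the total but not any individual client's value. Real gradients are floating point, so a fixed-point encoding into the field would be needed.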

11
Ollivier Ricci Curvature as a Geometric Biomarker for Biomedical Networks: From Ontology to Comorbidity Aging Trajectories

Agourakis, D. C.; Gerenutti, M.

2026-03-16 health informatics 10.64898/2026.03.14.26348393 medRxiv
Top 0.1%
6.4%

Network geometry offers a principled lens for understanding the structure of biomedical knowledge. We apply exact Ollivier-Ricci curvature (ORC), a discrete analogue of Riemannian curvature computed via optimal transport, to medical ontologies, disease comorbidity networks, biological interaction networks, and brain functional connectivity graphs. Three main results emerge. First, within a single database (the Human Phenotype Ontology), the formal IS-A taxonomy is hyperbolic ([Formula], tree-like), while the disease co-occurrence network is spherical ([Formula], clique-rich), a six-order-of-magnitude gap in the density parameter that the curvature phase transition framework predicts without free parameters. Second, age-stratified disease comorbidity networks from 8.9 million Austrian hospital patients reveal a geometric aging trajectory: mean ORC increases monotonically from [Formula] (age 20-30) to [Formula] (age 80+), driven by rising clustering and density that encode the accumulation of multimorbidity. Third, sedenion (ℝ¹⁶) Mandelbrot orbit features, which exploit the zero-divisor structure of the Cayley-Dickson tower, discriminate ASD-like from ADHD-like brain network topology (AUROC = 0.990, sedenion-only), providing complementary geometric information to ORC. Canonical biological networks (C. elegans neural, E. coli gene regulatory, protein-protein interaction) are uniformly spherical, suggesting that evolved biological networks universally favour redundant, triangle-rich connectivity. All core mathematical claims are machine-verified in Lean 4 (0 sorry in 7 core modules). These results establish ORC as a quantitative geometric biomarker for biomedical network analysis and demonstrate that the same phase transition framework governing semantic networks extends to clinical and biological domains.

12
Vision-language framework for multi-sequence brain magnetic resonance imaging

Lteif, D.; Jia, S.; Bit, S.; Kaliaev, A.; Mian, A. Z.; Small, J. E.; Mangaleswaran, B.; Plummer, B. A.; Bargal, S. A.; Au, R.; Kolachalama, V. B.

2026-04-04 radiology and imaging 10.64898/2026.03.30.26349106 medRxiv
Top 0.1%
6.4%

Structural magnetic resonance imaging (MRI) is a cornerstone for diagnosing neurological disorders, yet automated interpretation of multi-sequence brain MRI remains limited by challenges in cross-sequence reasoning and protocol variability. Here we present ReMIND, a vision-language modeling framework tailored for comprehensive multi-sequence and multi-volumetric brain MRI analysis. Trained on over 73,000 deidentified patient visits encompassing more than 850,000 MRI sequences paired with radiology reports from diverse clinical and research cohorts, ReMIND combined large-scale instruction tuning on more than one million clinically grounded question-answer (QA) pairs with targeted supervised fine-tuning for radiology report generation. At inference, ReMIND employed modality-aware reranking and correction, a report-level decoding strategy that suppressed unsupported modality claims while preserving linguistic fluency and clinical coherence. Cross-cohort generalization was maintained on independent external datasets from different institutions. These findings represent an advance toward consistent and equitable brain MRI interpretation, meriting prospective evaluation to support diagnosis and management of neurological conditions.

13
Physically Grounded Generative Modeling of All-Atom Biomolecular Dynamics

Feng, B.; Zhang, J.; Zhang, X.; Zhang, M.; Barth, P.; Liu, Z.; Li, Y.

2026-02-15 bioinformatics 10.64898/2026.02.15.705956 medRxiv
Top 0.1%
6.3%

Predicting the kinetic pathways of biomolecular systems at all-atom resolution is crucial for understanding protein function and drug efficacy, yet this task is hindered by the immense computational cost of conventional molecular dynamics (MD) simulations. While deep learning has revolutionized static structure prediction and equilibrium ensemble sampling, simulating the kinetics of conformational transitions remains a critical challenge. We introduce BioKinema, a physically grounded generative model that predicts continuous-time, all-atom biomolecular trajectories at a fraction of the cost of traditional simulations. In particular, BioKinema utilizes a scalable diffusion architecture with temporal attention mechanisms derived from Langevin dynamics. It employs a hierarchical forecasting-and-interpolation strategy to overcome the error accumulation that often plagues long-horizon generation. Through extensive validation, we demonstrate that BioKinema generates physically stable and dynamically accurate trajectories suitable for rigorous downstream analysis. The model captures key conformational transitions related to protein function. For protein-ligand complex systems, it successfully elucidates mechanisms such as induced-fit conformational changes and allosteric responses. Furthermore, BioKinema leverages enhanced sampling data to predict rare kinetic events, emerging as a powerful tool for estimating ligand unbinding pathways. Collectively, these results establish BioKinema as a robust alternative to MD that bridges the gap between static structure and dynamic function, enabling high-throughput exploration of the kinetic landscape for structural biology and drug discovery.

14
A Connectome-Constrained Jansen-Rit Framework for Inferring Cortical Gain Control and Ensemble Stability

Diaconescu, A. O.; Wang, Z.; Griffiths, J. D.

2026-01-31 neuroscience 10.64898/2026.01.28.702100 medRxiv
Top 0.1%
6.3%

Understanding how local circuit dynamics give rise to large-scale stability and instability of brain activity is a central challenge in computational neuroscience, with direct relevance for disorders characterized by disrupted excitatory-inhibitory balance, including schizophrenia spectrum disorder (SSD). Here, we introduce a principled methodology for recovering local neural parameters and low-dimensional dynamical biomarkers from a connectome-constrained Jansen-Rit (JR) neural mass model using variational free-energy inversion and sliding-window analysis. Each cortical region is modeled as a canonical excitatory-inhibitory microcircuit embedded within a whole-brain network whose long-range interactions are factorized into pyramidal-pyramidal, pyramidal-excitatory, and pyramidal-inhibitory subnetworks. Across 80 independent simulations, the inversion framework reliably recovered both microcircuit parameters and emergent biomarkers derived from neural states, including the mean-variance slope (β1), its spatial variability, and the lag-1 autocorrelation (ρ1). These quantities capture complementary aspects of cortical ensemble dynamics (gain sensitivity, regional heterogeneity, and temporal persistence associated with proximity to criticality) and were consistently estimated with minimal bias and high reliability. The recovered slope hierarchy [Formula] revealed an interpretable gain-control architecture in which inhibitory channels regulate damping, excitatory channels gate resonance, and pyramidal populations integrate network drive into stable output. Together, these results demonstrate that the JR model provides a tractable and biophysically grounded framework for linking synaptic parameters, network structure, and ensemble-level stability. Although motivated by questions surrounding psychosis risk and SSD, the proposed approach is general and establishes a foundation for future applications in model-based inference, network control, and adaptive neuromodulation.
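
As a small illustration of one biomarker named above, the lag-1 autocorrelation can be estimated per sliding window as sketched below. This uses the standard sample estimator and is not taken from the authors' code: the window and step sizes, the convention for constant windows, and the function names are assumptions.

```python
def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence of floats."""
    n = len(x)
    mean = sum(x) / n
    den = sum((v - mean) ** 2 for v in x)
    if den == 0.0:
        return 0.0  # constant window: reported as 0 by convention (assumption)
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    return num / den


def sliding_lag1(x, win, step):
    """Lag-1 autocorrelation in each window of length win, advanced by step."""
    return [
        lag1_autocorr(x[s:s + win])
        for s in range(0, len(x) - win + 1, step)
    ]


if __name__ == "__main__":
    # A strictly alternating signal is strongly anti-persistent.
    alternating = [1.0, -1.0] * 5
    print(lag1_autocorr(alternating))
    print(sliding_lag1(alternating, win=4, step=2))
```

Values near 1 indicate slowly decaying fluctuations, which is the sense in which the abstract links this quantity to temporal persistence.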

15
VDJdive and ECLIPSE enhance single-cell TCR sequencing analysis through the probabilistic resolution of ambiguous clonotypes

Burns, E. C.; Movassagh, M.; Lundell, J. F.; Ye, M.; Ye, Z.; Oliveira, G.; Rout, R.; Hugaboom, M. B.; Street, K.; Braun, D. A.

2026-02-20 immunology 10.64898/2026.02.18.706444 medRxiv
Top 0.1%
6.2%

Single-cell T cell receptor sequencing (scTCR-seq) has transformed our ability to track individual T cell clones and has been instrumental in advancing our understanding of human T cell differentiation. However, current computational pipelines for analysis, which require precise matching of CDR3 sequences from exactly 2 heterodimeric TCR chains to define the clonotype for each cell, are inherently limited because of the substantial proportion of cells possessing "ambiguous" clonotypes driven by missing (undetected from the technical issue of chain "dropout") or extra chains (present from either true biological expression or due to technical artifacts such as cellular doublets and ambient TCR contamination). As a result, clone sizes are artificially reduced, impeding the tracking of clones across conditions and differentiation states. Here we introduce VDJdive and ECLIPSE (Enhanced CLonotypic Inference via Prediction of Single-cell Expression), two computational methods that, together, resolve this clonal ambiguity by utilizing the expectation-maximization algorithm for the clonal prediction of ambiguous cells. These methods consider chain pairings across the sample, allowing for high-fidelity prediction of chains lost due to dropout and the discernment of biological expression of extra chains from technical artifacts. Consequently, clone sizes are augmented and cells without clonotype assignments are minimized. Our approach facilitates enhanced clonal tracking through these elevated clone sizes and is easily implementable, compatible with standard single-cell transcriptomic workflows, and broadly applicable across biological contexts and T cell subsets.

16
HHBayes: A Flexible Bayesian Framework for Simulating and Analyzing Household Transmission Dynamics

Li, K.; Hou, Y.; Mukherjee, B.; Pitzer, V. E.; Weinberger, D. M.

2026-04-03 infectious diseases 10.64898/2026.04.01.26349903 medRxiv
Top 0.1%
5.1%

Household transmission studies are important for understanding infectious disease transmission and evaluating interventions; however, they are frequently constrained by methodological challenges, including in study design and sample size determination, and in estimating parameters of interest after collecting the data. Existing tools often lack flexibility in modeling age-specific susceptibility, infectivity patterns, and the impact of interventions such as vaccination or prophylaxis. Here, we develop HHBayes, an open-source R package that provides a unified framework for simulating and analyzing household transmission data using Bayesian methods. The package enables researchers to: (1) simulate realistic household transmission dynamics with highly customizable variables; (2) incorporate viral load data (measured in viral copies/mL or cycle threshold values) to model time-varying infectiousness; (3) estimate age-dependent susceptibility and infectivity parameters using Hamiltonian Monte Carlo methods implemented in Stan; and (4) evaluate intervention effects through user-defined covariates that modify susceptibility or infectivity. We demonstrate the capabilities of the package through simulation studies showing accurate parameter recovery and applications to seasonal respiratory virus transmission, including the impact of vaccination and antiviral prophylaxis on household attack rates. HHBayes addresses a critical gap in infectious disease epidemiology by providing researchers with accessible tools for both prospective study design and retrospective data analysis. The flexibility of the package in handling complex household structures, time-varying infectiousness, and intervention effects makes it valuable for studying diverse pathogens.

17
Quantitative extrapolation from single-tags (QuEST) immunofluorescence microscopy to derive TCR signalosome stoichiometries in human primary T cells

Fei, P.; Dustin, M. L.

2026-03-31 immunology 10.64898/2026.03.28.715001 medRxiv
Top 0.1%
4.9%

Upon T cell receptor (TCR) engagement, a T cell forms an immunological synapse (IS) with an antigen-presenting cell (APC), which can be mimicked by purified ligands on supported lipid bilayers (SLBs) [1,2]. Microvilli actively scan the surface; upon initial engagement, F-actin-dependent TCR microclusters form, and the central supramolecular activation cluster (cSMAC) sustains TCR signaling in CD8 T cells [3,4]. Although signaling activities within the IS have been observed qualitatively through total internal reflection immunofluorescence microscopy [5-7], the stoichiometric relationships among the components of the TCR signalosome remain unknown. In this study, we employed a two-step approach to quantify the components of the TCR signalosome. First, Jurkat cell lines expressing GFP-tagged proteins on a knockout background were used to calibrate fluorescence intensity (IF) signals against molecular copy numbers, based on measurements of single-tag signals and multiple corrections. In the second step, this calibration was applied to determine the stoichiometries of key TCR signalosome components, including TCR, CD8, CD28, CD45, PD-1, Lck, ZAP-70, LAT, and PLCγ1, across scanning, early activation, and sustained activation states in human primary T cells. We refer to the method as quantitative extrapolation from single-tags (QuEST) immunofluorescence microscopy. Applying QuEST, we were surprised to find that the ZAP-70:TCR ratio in microclusters and the cSMAC was 1:1, far from the potential 10:1 ratio. Nanoscale structures of the TCR signalosome were further captured using direct stochastic optical reconstruction microscopy (dSTORM), confirming that ZAP-70 was strongly co-localized with the TCR. Moreover, we applied QuEST to confirm the presence of T cell intrinsic CD28 recruitment, independent of CD80 or CD86 on SLBs, during TCR activation. This T cell intrinsic CD28 recruitment could be disrupted through engagement of PD-1 with PD-L1 on SLBs. This shows that PD-1 engagement can disrupt T cell intrinsic CD28 costimulation. QuEST provides a broadly applicable pipeline for quantitative analysis of TCR signalosomes in human primary cells, enabling a quantitative basis for the rational manipulation and engineering of the TCR signalosome in immunotherapies.

18
Multiscale conformational sampling of multidomain fusion proteins by a physics informed diffusion model

Su, Z.; Wang, B.; Wu, Y.

2026-03-13 bioinformatics 10.64898/2026.03.11.711061 medRxiv
Top 0.1%
4.9%

Multidomain fusion proteins, such as bispecific antibodies, rely on highly flexible linker regions for their therapeutic efficacy. Characterizing these vast conformational ensembles is crucial for rational drug design; however, while all-atom molecular dynamics (MD) is the traditional gold standard, its immense computational cost makes simulating large-scale domain motions prohibitive. Recently, deep generative diffusion models have emerged as a rapid alternative for sampling protein dynamics. Yet, being trained primarily on massive databases of structured, static domains, these generic models often lack the biophysical constraints required to thoroughly sample the large-scale dynamics of highly flexible multidomain architectures. To overcome this, we leverage microsecond MD trajectories of a multidomain protein construct with various linkers to train a multiscale diffusion framework utilizing an Equivariant Graph Neural Network (EGNN). To efficiently model the dynamics of the large molecular complexes, we employ a coarse-grained spatial graph that condenses rigid domains into center-of-mass anchors while preserving explicit backbone resolution for the flexible linker. By further integrating foundational rules in biophysics directly into both the training objective and the inference process, our model generates high-fidelity conformational ensembles that reproduce the thermodynamic distributions of long-timescale MD. This physics-informed approach provides a mathematically stable, highly scalable platform for the rapid multiscale characterization of flexible biologics, significantly accelerating the rational design of fusion protein therapeutics.
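
The coarse-graining step described here, in which rigid domains collapse to center-of-mass anchors while linker residues keep explicit positions, might look roughly like the sketch below. The function name, the `-1` label for linker residues, and the toy coordinates are hypothetical; the abstract does not specify how the graph nodes are actually built.

```python
import numpy as np

def coarse_grain(coords, masses, domain_ids):
    """Build coarse-grained node positions from residue-level coordinates.

    coords: (N, 3) positions; masses: (N,) weights;
    domain_ids: (N,) integer labels, with -1 marking flexible-linker residues
    (a labeling convention assumed for this sketch).
    Returns one center-of-mass row per rigid domain, followed by the
    unchanged linker positions.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    domain_ids = np.asarray(domain_ids)

    anchors = []
    for d in np.unique(domain_ids[domain_ids >= 0]):
        mask = domain_ids == d
        # Mass-weighted mean position of all residues in this rigid domain.
        anchors.append(np.average(coords[mask], axis=0, weights=masses[mask]))

    linker = coords[domain_ids < 0]
    return np.vstack(anchors + [linker]) if anchors else linker

# Toy construct: domain 0 (2 residues), linker (2 residues), domain 1 (2 residues).
coords = [[0, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0], [5, 0, 0], [7, 0, 0]]
masses = [1, 1, 1, 1, 1, 1]
domain_ids = [0, 0, -1, -1, 1, 1]
nodes = coarse_grain(coords, masses, domain_ids)
print(nodes.shape)
```

Six residues reduce to four nodes in this toy case; for a real construct with large folded domains the reduction would be far greater, which is presumably where the efficiency gain comes from.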

19
RePaRank: An Efficient Architecture for Antibody-Antigen Interface Prediction by Proximity Ranking

Bednarek, J.; Janusz, B.; Krawczyk, K.

2026-03-05 immunology 10.64898/2026.03.03.708462 medRxiv
Top 0.1%
4.9%

The prediction of protein-protein interactions is central to structural biology, yet leading models are often computationally expensive, creating an accessibility gap for many high-throughput applications. Furthermore, common evaluation metrics such as binary contact prediction can be unreliable. In this work, we address both challenges. We introduce RePaRank, a computationally efficient deep learning architecture with 39.4 million parameters that predicts antibody-antigen interfaces by reframing the problem as a proximity ranking task in a learned embedding space. We also propose the Precision AUC, a robust, ranking-based metric that provides a more stable assessment of model performance than traditional binary methods. Our experiments show that RePaRank consistently outperforms benchmark models in paratope prediction and is highly competitive in epitope prediction among models that do not require external resources such as multiple sequence alignments (MSAs). RePaRank offers a practical and powerful tool for the immunoinformatics community.

20
A generalized synthetic control algorithm for sparse functional data

Shao, L.; Pohl, K. M.; Thompson, W. K.

2026-02-25 neuroscience 10.64898/2026.02.23.707582 medRxiv
Top 0.1%
4.9%

The Synthetic Control Method (SCM) and its interactive factor model generalizations (GSC) are powerful for estimating causal effects from panel data but are not easily applied when follow-up is irregular or sparse, common features of biomedical cohorts. We develop a Bayesian functional extension of GSC that treats each unit's outcome path as a smooth latent trajectory and accommodates unequally spaced measurements. Trajectories are approximated using Functional Principal Components Analysis (FPCA), providing a data-driven basis that captures dominant patterns with minimal shape assumptions while borrowing strength across individuals. Within this representation, we learn unit and time latent factors jointly with FPCA scores from the control data, construct counterfactual trajectories for treated units, and quantify uncertainty via the posterior. Identification relies on a latent-factor/weak-trend condition and overlap of controls and treated units in the functional score space. Simulation studies varying donor-pool size, treated-unit size, and sampling density show that the proposed approach (GSC-FPCA) yields low bias when sampling is irregular or sparse, with well-calibrated interval coverage across a broad range of scenarios. We apply the method to longitudinal neuroimaging data from the National Consortium on Alcohol and Neurodevelopment in Adolescence - Adulthood (NCANDA-A) study to estimate the effect of adolescent binge drinking on subsequent brain volumes. Using between 1 and 9 observed time points per participant, GSC-FPCA produces stable counterfactuals and detects a negative impact on gray-matter volumes with sustained high levels of binge drinking. Our results demonstrate that embedding GSC within a functional framework enables robust causal inference in biomedical applications characterized by irregularly spaced visits, limited observations, and complex outcome dynamics.